Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows fprs preservation #1216

Merged
merged 41 commits into from
Apr 10, 2020

Conversation

iximeow
Copy link
Contributor

@iximeow iximeow commented Mar 3, 2020

Moved from bytecodealliance/cranelift#1378.

There were a few unresolved conversations from the original PR that I'll copy below.

; nextln: store notrap aligned v8, v7
; nextln: store notrap aligned v9, v7+16
; check: v10 = stack_addr.i64 ss0
; nextln: v11 = load.f64 notrap aligned v10
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a comment in the original PR referencing a line I couldn't get to test properly here, but would make me happier these tests being robust in the face of the future.

The exact text doesn't match anymore but the same idea applies. I would like to test that preserves and restores have the right registers. In theory a nextln like

; nextln: [Mp2fld#710,%xmm6]                  v11 = load.f64 notrap aligned v10

would do the trick, but this yields a test error like

> [RexOp1spaddr8_id#808d,%rbp]        v10 = stack_addr.i64 ss0
                                      ^~~~~~~~~~~~~~~~~~~~~~~~
Matched #10: \bv10 = stack_addr\.i64 ss0\b
> [Mp2fld#710,%xmm6]                  v11 = load.f64 notrap aligned v10
Missed #11: \[Mp2fld\#710,%xmm6\]      v11 = load\.f64 notrap aligned v10\b

which is suspicious because the only difference on the matching lines is whitespace. When whitespace is equalized, this still yields an error. We use specific checks like this in other filetests also run as test compile, so I suspect this is a bug in the filetest runner?

I would greatly appreciate thoughts if anyone is particularly familiar with filetest.

arkpar pushed a commit to paritytech/wasmtime that referenced this pull request Mar 4, 2020
arkpar pushed a commit to paritytech/wasmtime that referenced this pull request Mar 4, 2020
@iximeow iximeow force-pushed the windows-fprs-preservation branch from fbd067a to b55a43d Compare March 10, 2020 18:53
@iximeow
Copy link
Contributor Author

iximeow commented Mar 11, 2020

The CI failure is due to tripping this check that we wouldn't misreport unwind information in cases where we have callee-save floating-point registers. I tried reproducing this locally by building the module in multi_func_value/func.wast by clif-util wasm funct_module.wast --target x86_64-windows -D, but for some reason that seems to build fine, using a significantly smaller amount of stack space? I ran clif-util from Linux, but the only thing that should be a factor here is the target specification. I'm not sure why it doesn't reproduce this way. Is there a flag I should set that influences lowering to clif as well?

Anyway, I printed the faulting prologue in CI to double-check that the panic is correctly hit, and indeed a function in the module uses well over the 240-byte limit I've noted above:

x86_push.i64 v105
offset 2: copy_special %rsp -> %rbp 
offset 5: x86_push.i64 v106 
offset 7: x86_push.i64 v107 
offset 9: x86_push.i64 v108 
offset 11: x86_push.i64 v109 
offset 13: x86_push.i64 v110 
offset 15: x86_push.i64 v111 
offset 17: adjust_sp_down_imm 336
offset 24: v112 = stack_addr.i64 ss18 
offset 32: store.f64 notrap aligned v113, v112 
offset 37: store.f64 notrap aligned v114, v112+16
offset 43: store.f64 notrap aligned v115, v112+32

Since this function uses callee-save floats, we miscompile it on Windows today. Should I disable this test on Windows? (not sure if there's a mechanism for this in spectests, honestly) It would be fixed in the same PR that adjusts stack layout to not be constrained to 240 byte frames when using callee-save xmm registers.

Edit: this panic is only reached if you want a fastcall function with all of:

  • a >240 byte stack frame
  • callee-save floats
  • unwind information

This PR fixes callee-save float preservation in fastcall functions generally. Asking for unwind information causes the panic. I'm not sure how easily Windows targets can get the former without the latter.

@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Mar 11, 2020
@bytecodealliance bytecodealliance deleted a comment from github-actions bot Mar 12, 2020
@bytecodealliance bytecodealliance deleted a comment from github-actions bot Mar 12, 2020
@iximeow
Copy link
Contributor Author

iximeow commented Mar 14, 2020

Nudging this again as I think this is as good as this fix can get without getting into the weeds on stack layout details.

I have three(ish) issues to open assuming a 👍 :

  • Use of an F64 type instead of the more accurate F64x2 for callee-save FPRs, which works but just isn't fully expressive of intent. See here
  • Fix how Cranelift lays out Fastcall stacks to avoid the 240-byte stack frame limit when using unwind information and callee-save XMM registers. See here for conversation and here for a test I've had to disable as a consequence.
  • Might not be worth a tracking issue, but there are two TODO's contingent on an encoding existing that would result in changes here after this lands.

@github-actions
Copy link

Subscribe to Label Action

This issue or pull request has been labeled: "cranelift"

Users Subscribed to "cranelift"

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@iximeow
Copy link
Contributor Author

iximeow commented Mar 26, 2020

Pardon the rebase, just want to get this on a more recent master so as to pick up a fix for nightly builds failing in CI.

@bjorn3
Copy link
Contributor

bjorn3 commented Mar 27, 2020

@iximeow I think you forgot to push :)

@iximeow iximeow force-pushed the windows-fprs-preservation branch from 5c0e249 to e71b7fe Compare March 27, 2020 17:56
@iximeow
Copy link
Contributor Author

iximeow commented Mar 27, 2020

@bjorn3 even worse, I'd pushed to the wrong remote. Thanks for the heads up!

@peterhuene
Copy link
Member

It looks like there's at least a few more tests to disable due to the frame size limitation this adds to functions using Windows x64 calling convention.

@iximeow
Copy link
Contributor Author

iximeow commented Apr 3, 2020

it passes 🎉

GitHub appears to believe that 3be2d1c doesn't exist, so I pushed an empty commit (ea9f77a) to fool CI into running after yesterday's availability issues. After ignoring more tests, it's green again 🥴

@peterhuene
Copy link
Member

@iximeow I'll review this soon and if everyone signs off, I think we should merge this before my unwind refactoring changes. I can easily update that PR to integrate this PR's changes to save the FPRs in the windows unwind info.

Copy link
Member

@peterhuene peterhuene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me 👍, even with the introduced stack size limit on Windows.

Just a few comments that aren't particularly blocking.

We probably want @bjorn3 or another core cranelift maintainer to sign off as well.

| ("multi_value", "func") => {
// FIXME These involves functions with very large stack frames that Cranelift currently
// cannot compile using the fastcall (Windows) calling convention.
// See https://github.com/bytecodealliance/wasmtime/pull/1216.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create a new issue to track the stack layout for CSRs problem and link here instead of the PR? My concern is only that it's hard to make out what's relevant of a complex PR like this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely. That's the third item in my to-do list which I've been waiting for even a preliminary ✔️ to actually file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a reminder to update this to https://github.com/bytecodealliance/wasmtime/issues/1475.

cranelift/codegen/src/isa/x86/unwind.rs Outdated Show resolved Hide resolved
@bjorn3
Copy link
Contributor

bjorn3 commented Apr 6, 2020

We probably want @bjorn3 or another core cranelift maintainer to sign off as well.

I am actually not a maintainer. :)

@peterhuene
Copy link
Member

😱 my apologies, although your review is obviously always appreciated :) I'm not actually a core cranelift maintainer either, so I'm more comfortable when others sign off on bits I haven't been directly involved in.

@bjorn3
Copy link
Contributor

bjorn3 commented Apr 6, 2020

😱 my apologies, although your review is obviously always appreciated :)

You are not the first to think I am a maintainer of a project. :) I often review PR's on projects I am (slightly) interested in. And sometimes the PR author assumes I am a maintainer.

@peterhuene
Copy link
Member

I'll ping @bnjbvr for the core maintainer sign-off, then.

@iximeow iximeow force-pushed the windows-fprs-preservation branch from 87b7017 to 4e1b10e Compare April 9, 2020 23:12
@iximeow
Copy link
Contributor Author

iximeow commented Apr 10, 2020

I had to force push to rebase and fix a merge conflict that crept in. Unwind-producing functions are Result-y, which of course resulted in some Wasmtime changes as well. Given the wider scope, I'd like one last +1 before clicking the tempting green button :)

Co-Authored-By: bjorn3 <bjorn3@users.noreply.github.com>
Copy link
Member

@peterhuene peterhuene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 on the result return change.

@iximeow iximeow merged commit 4cca510 into bytecodealliance:master Apr 10, 2020
@iximeow iximeow deleted the windows-fprs-preservation branch April 10, 2020 20:28
@peterhuene
Copy link
Member

🎉 thanks for fixing this @iximeow! 🎉 I realize it was a slog of a PR.

Now I get to merge these changes in to my refactoring PR and we'll see about prioritizing the fixing of the stack layout of the register saves 😄

@hrydgard
Copy link
Contributor

Awesome, thanks for getting this done!

CryZe added a commit to CryZe/wasmtime that referenced this pull request Apr 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator wasmtime:api Related to the API of the `wasmtime` crate itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants